I. INTRODUCTION

Given an arbitrary collection of numbers, can one predict what their
distribution will be? It would be fanciful to think so, but consider the
following rationale. Any experiment in nature involves partitioning some
part of the universe (in mass, energy, volume, or some other physical
quantity) from the rest. The sizes of the resulting numbers are then
subject to the conservation laws of physics (mass-energy, momentum,
angular momentum). Indeed, in statistical physics there is a fundamental
distribution of energy E, the celebrated Boltzmann distribution
$P(E) \propto e^{-E/kT}$, where T is the temperature and k is Boltzmann’s
constant. This distribution, while fully justified only in certain
thermodynamic limits [1], is an extraordinarily powerful tool for
analyzing many systems. Could there be such a fundamental distribution
of numbers?

Surprisingly, many distributions in nature, economics, and sociology,
such as Zipf’s law, follow power-law distributions rather than the
exponential Boltzmann distribution. Such power laws have inspired a wide
variety of explanations and arguments over the years [2]. Most recently,
arguments based on information theory known as random group [3] or
community [3] formation have shown how long-tailed distributions with
general power laws can be derived from a model with a small number of
parameters. These are written in terms of a maximum entropy principle [5]
given simple constraints. In physics, however, such a procedure [5],
using the constraint of energy conservation, leads to the Boltzmann
distribution. How then could a physical conservation law lead to a
power-law distribution?

In this paper we argue that, under certain conditions similar to energy
conservation, there is indeed a universal distribution that is intimately
related to the Boltzmann distribution for quantum particles. In an
appropriate limit, which we call the equipartition limit, this
distribution tends to the simplest inverse power law. We further provide
a concrete combinatorial proof of this limit, and verify it against
numerical simulations. Most importantly, we show how this limit has, as a
simple consequence, Benford’s law for the leading digit distribution [7].
A data set is Benford if the probability of observing a first digit of
d in 1, 2, ..., 9 is $\log_{10}(1 + 1/d)$. This property has been observed
in a wide variety of data sets from economics, sociology, mathematics,
physics, and geology, among others [5]. While many mathematical processes
are known to exhibit the Benford property [9], we believe that this
argument, originally due to Lemons [10], is one of the simplest. In short,
a conservation law implies a power law that directly leads to Benford’s
law.

This paper is organized as follows. We begin in Section II by showing how
the principle of maximum entropy, when applied to the partition of
numbers, leads to a power law for the average number of parts of a given
size. This is extended in Section III, in which the theory of partitions
is used to justify this result, in an appropriate limit. The implication
of this power law for Benford’s law is presented in Section IV, while an
extension to more general power laws is presented in Section V. We
conclude in Section VI, and provide additional mathematical details in
the Appendix.

II. POWER LAW FROM MAXIMUM ENTROPY 

The main topic of this paper is the distribution of parts subject to an
overall conservation law. In more detail, we consider the distribution of
N numbers $n_j$, corresponding to piece sizes $x_j$, so that the pieces
add up to some given quantity X:

$$
X = \sum_{j=1}^{N} n_j x_j. \tag{1}
$$

Here the part set $\{x_1, x_2, \ldots, x_N\}$ is fixed, but the number
$n_j$ of parts of a given size $x_j$ is not. The numbers $n_j$ specify a
partition of X, which could result from a fragmentation process, as might
occur in nuclear physics [11]. We consider the set of all such partitions
of a quantity X, subject to Eq. (1). The distribution we seek corresponds
to the average number of parts $\langle n_j \rangle$ of size $x_j$, when
all partitions can occur with equal probability. The essence of this
argument was originally given by Lemons [10], using heuristic arguments
for a continuous set of parts. In this section we will derive the
probability distribution and average number for a discrete part set by
using methods from statistical physics, namely Jaynes’s principle of
maximum entropy [5]. A similar approach, specific to the fragmentation of
solids, can be found in [12], while an alternative application of maximum
entropy to Benford’s law can be found in [13].

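In this formulation, we look for the probability distribution
$p(\mathbf{n})$ for finding $n_1$ pieces of size $x_1$, $n_2$ pieces of
size $x_2$, etc., that maximizes the entropy

$$
S = -\sum_{\mathbf{n}} p(\mathbf{n}) \ln p(\mathbf{n})
    - a \left[ \sum_{\mathbf{n}} p(\mathbf{n}) - 1 \right]
    - \beta \left[ \sum_{\mathbf{n}} p(\mathbf{n}) \sum_{j=1}^{N} n_j x_j - X \right],
\tag{2}
$$

where $\mathbf{n}$ is the vector of integers $(n_1, n_2, \ldots, n_N)$.
Maximizing the entropy, we find

$$
p(\mathbf{n}) = \prod_{j=1}^{N} \left( 1 - e^{-\beta x_j} \right) e^{-\beta n_j x_j},
\tag{3}
$$

where $\beta$ is a Lagrange multiplier, to be specified below. This result
conforms with the usual expectation that the distribution associated with
a conserved quantity is exponential.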
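
As a quick numerical sanity check of the factorized form in Eq. (3), the sketch below sums a single factor $(1 - e^{-\beta x}) e^{-\beta n x}$ over n and confirms that it is normalized and that its mean equals $1/(e^{\beta x} - 1)$. The values of `beta` and `x` and the cutoff `n_max` are illustrative assumptions, not values taken from the paper.

```python
import math

def geometric_factor(n, beta, x):
    """Single-part factor of Eq. (3): (1 - e^{-beta x}) e^{-beta n x}."""
    q = math.exp(-beta * x)
    return (1.0 - q) * q**n

def check_factor(beta, x, n_max=10_000):
    """Return (normalization, mean) of the factor, summed for n = 0..n_max."""
    probs = [geometric_factor(n, beta, x) for n in range(n_max + 1)]
    norm = sum(probs)
    mean = sum(n * p for n, p in enumerate(probs))
    return norm, mean

beta, x = 0.05, 3.0            # illustrative values (assumed)
norm, mean = check_factor(beta, x)
bose = 1.0 / math.expm1(beta * x)   # closed-form mean 1/(e^{beta x} - 1)
print(norm, mean, bose)
```

Because the factors for different part sizes are independent, the same check applies to each $x_j$ separately.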

Given the probability distribution for $\mathbf{n}$, we now consider how
frequently each part $x_j$ occurs. That is, when we observe a given
partitioning of a system, we find a number of parts $n_1$ of size $x_1$,
$n_2$ of size $x_2$, $n_3$ of size $x_3$, etc. The distribution of
observed part sizes (or fragments) $x_j$ will be proportional to the
number of occurrences of a given part $x_j$, namely $n_j$. Thus, we
compute the expectation value of $n_j$ as a function of $x_j$:

$$
\langle n_j \rangle = \sum_{\mathbf{n}} n_j \, p(\mathbf{n}) = \frac{1}{e^{\beta x_j} - 1},
\tag{4}
$$

where the Lagrange multiplier is found by the conservation equation

$$
X = \sum_{j=1}^{N} \frac{x_j}{e^{\beta x_j} - 1}.
\tag{5}
$$

This form of the fragment distribution is formally equivalent to the
average number of quanta for a set of harmonic oscillators with energies
$x_j = \hbar \omega_j$ and inverse temperature $\beta = 1/kT$ [1].

In the limit when $X \gg 1$, as is usually the case in real-world data,
we expect that $\beta \ll 1$; this corresponds to a high-temperature
limit. In this case we can solve the conservation equation perturbatively
in $\beta$ to find $\beta \approx N/X$ and thus

$$
\langle n_j \rangle \approx \frac{1}{\beta x_j} \approx \frac{X}{N x_j}.
\tag{6}
$$
FIG. 1: Average number of parts $\langle n_j \rangle$ for
$\{x_j\} = \{1,2,3,4,5,6,7,8,9,10\}$, for various values of
X = {25, 50, 100, 200, 400, 800} (from bottom to top). The dots are exact
calculations and the solid lines the maximum entropy result using
Eq. (4). Also shown is the equipartition result of Eq. (6) for X = 1600
(dashed).


We call this the equipartition limit, by analogy with the
high-temperature limit for quantum harmonic oscillators, in which each
oscillator has the same average energy
$\langle n_j \rangle \hbar \omega_j = kT$.
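
The conservation equation, Eq. (5), has no closed-form solution for $\beta$, but it is easy to solve numerically. The following is a minimal sketch, assuming a plain bisection search (the function names and bracket values are our own choices, not the authors' code), that solves Eq. (5) for the part set {1, ..., 10} and compares the maximum entropy result of Eq. (4) with the equipartition estimate of Eq. (6).

```python
import math

def total(beta, parts):
    """Right-hand side of Eq. (5): sum_j x_j / (exp(beta x_j) - 1)."""
    return sum(x / math.expm1(beta * x) for x in parts)

def solve_beta(X, parts, lo=1e-12, hi=50.0, iters=200):
    """Bisection for beta; total() decreases monotonically with beta."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if total(mid, parts) > X:
            lo = mid          # sum still too large, so beta must increase
        else:
            hi = mid
    return 0.5 * (lo + hi)

parts = list(range(1, 11))
X = 800
beta = solve_beta(X, parts)
n_maxent = [1.0 / math.expm1(beta * x) for x in parts]   # Eq. (4)
n_equi = [X / (len(parts) * x) for x in parts]            # Eq. (6)
for x, a, b in zip(parts, n_maxent, n_equi):
    print(x, round(a, 3), round(b, 3))
```

For X = 800 the solved $\beta$ is close to N/X, and the two estimates of $\langle n_j \rangle$ agree to within a few percent.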


This derivation provides solid evidence for Lemons’ original argument
[10] that the number of parts of size x, when subject to a conservation
law of the form Eq. (1), satisfies the power law $n(x) \sim 1/x$. The
approach given here also applies to more general partitions, including
those with a continuous set of parts. It can also be tested numerically.
Using the part set {1, 2, ..., 10}, we have calculated the exact set of
partitions for values of X between 25 and 800 (the latter with over 100
trillion partitions) and the corresponding average values for
$\langle n_j \rangle$. The results, shown in Fig. 1, are well described
by the maximum entropy result Eq. (4), provided we numerically solve
Eq. (5) for $\beta$. Finally, these results converge to the equipartition
result Eq. (6) for large X.
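
For small X, the exact averages can be reproduced by brute force. The sketch below is an illustration only (the recursive enumeration is an assumed approach, not necessarily how the exact calculations in Fig. 1 were done): it lists every partition of X = 25 into the parts {1, ..., 10}, weights each equally, and averages the counts. Enumeration becomes impractical well before X = 800.

```python
def partitions(X, parts):
    """Yield count vectors (n_1, ..., n_N) with sum_j n_j * parts[j] == X."""
    def rec(remaining, idx):
        if idx == len(parts):
            if remaining == 0:
                yield ()
            return
        x = parts[idx]
        for n in range(remaining // x + 1):
            for rest in rec(remaining - n * x, idx + 1):
                yield (n,) + rest
    yield from rec(X, 0)

def average_counts(X, parts):
    """Average n_j over all partitions, each weighted equally."""
    sums = [0] * len(parts)
    count = 0
    for vec in partitions(X, parts):
        count += 1
        for j, n in enumerate(vec):
            sums[j] += n
    return count, [s / count for s in sums]

parts = list(range(1, 11))
X = 25
count, avg = average_counts(X, parts)
conserved = sum(a * x for a, x in zip(avg, parts))   # should equal X
print(count, [round(a, 3) for a in avg], conserved)
```

Since every partition satisfies Eq. (1) exactly, the averaged counts must satisfy it as well, which gives a simple consistency check on the enumeration.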


III. PARTITION NUMBER CALCULATION 

In the previous section we presented what could be termed a “canonical
ensemble” calculation of the fragment distribution n(x). Such a
calculation applies to the behavior of a set of systems for which the
conservation law holds on average. An alternative calculation uses the
“microcanonical ensemble”, a set of systems for which the conservation
law holds exactly [1]. In this section we consider such a calculation of
the equipartition limit of Eq. (6), using the theory of integer
partitions [14]. In this framework we can find the average number of
parts by exactly averaging over all partitions of the conserved quantity.

We note that unrestricted partition problems have been used previously
to analyze fragmentation of nuclei [11, 15-17]. By contrast, here we
consider the restricted partition number $P_H(X)$, that is, the number of
ways to partition an integer X into the set of integers
$H = \{x_1 = 1, x_2, \ldots, x_N\}$, as in Eq. (1). Note that setting
$x_1 = 1$ ensures that a partition will exist for every X.

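We begin by introducing the generating function

$$
\sum_{k=0}^{\infty} P_H(k) \, q^k = \prod_{x \in H} \left( 1 - q^x \right)^{-1},
\tag{7}
$$

from which any given number can be obtained by multiple differentiation:

$$
P_H(X) = \frac{1}{X!} \left( \frac{d}{dq} \right)^{X}
         \prod_{x \in H} \left( 1 - q^x \right)^{-1} \Bigg|_{q \to 0}.
\tag{8}
$$

Evaluating this partition number is a hard problem, but useful
approximations [18-20] and bounds [21, 22] exist.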
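
For moderate X, the coefficients of the generating function in Eq. (7) can be computed directly. The sketch below uses the standard dynamic-programming recurrence for counting partitions into a fixed set of parts (multiplying the series by one factor $1/(1 - q^x)$ at a time); it is offered as an illustration under that assumption and is not taken from the paper.

```python
def restricted_partition_numbers(X_max, parts):
    """Return [P_H(0), ..., P_H(X_max)] for the part set `parts`."""
    P = [0] * (X_max + 1)
    P[0] = 1                          # the empty partition
    for x in parts:                   # multiply the series by 1/(1 - q^x)
        for k in range(x, X_max + 1):
            P[k] += P[k - x]
    return P

P = restricted_partition_numbers(800, range(1, 11))
print(P[25], P[800])
```

Python integers are arbitrary precision, so the count for X = 800 (quoted in Section II as exceeding 100 trillion) is obtained exactly.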

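The average number of parts of size $x_j$ can be found by manipulating
the partition functions $P_H(X)$ and $P_H(X; n_j)$:

$$
\langle n_j \rangle = \frac{1}{P_H(X)} \sum_{n_j} n_j \, P_H(X; n_j),
\tag{9}
$$

where $P_H(X; n_j)$ is the number of partitions of X with exactly $n_j$
parts of size $x_j$. Using its generating function,

$$
\sum_{k=0}^{\infty} P_H(k; n_j) \, q^k
  = q^{n_j x_j} \left( 1 - q^{x_j} \right) \prod_{x \in H} \left( 1 - q^x \right)^{-1},
\tag{10}
$$

and performing the sum over $n_j$, we have

$$
\sum_{n_j} n_j \, P_H(X; n_j)
  = \frac{1}{X!} \left( \frac{d}{dq} \right)^{X}
    \left[ \frac{q^{x_j}}{1 - q^{x_j}} \prod_{x \in H} \left( 1 - q^x \right)^{-1} \right]_{q \to 0}.
\tag{11}
$$

This last expression can be simplified by using the Leibniz rule for
differentiation, so that

$$
\langle n_j \rangle = \frac{1}{P_H(X)} \sum_{\ell=1}^{\lfloor X/x_j \rfloor} P_H(X - \ell x_j).
\tag{12}
$$

This provides an exact expression for the average number of parts.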
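
Equation (12) can be evaluated exactly once the partition numbers are known. The following sketch is one assumed way to do this (the helper names are hypothetical): it computes $P_H(k)$ with the standard dynamic-programming recurrence and uses exact rational arithmetic so that the conservation law can be checked without rounding error.

```python
from fractions import Fraction

def partition_numbers(X_max, parts):
    """Return [P_H(0), ..., P_H(X_max)] via the standard recurrence."""
    P = [0] * (X_max + 1)
    P[0] = 1
    for x in parts:
        for k in range(x, X_max + 1):
            P[k] += P[k - x]
    return P

def average_parts(X, parts):
    """Exact <n_j> from Eq. (12): sum_{l>=1} P_H(X - l x_j) / P_H(X)."""
    P = partition_numbers(X, parts)
    return [
        Fraction(sum(P[X - l * x] for l in range(1, X // x + 1)), P[X])
        for x in parts
    ]

parts = list(range(1, 11))
X = 800
avg = average_parts(X, parts)
for x, a in zip(parts, avg):
    # ratio of the exact average to the equipartition estimate X / (N x_j)
    print(x, float(a), float(a) / (X / (len(parts) * x)))
```

The printed ratios show how closely the exact averages track the equipartition estimate for this value of X.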

To proceed, we use a well-known approximation [19, 20] for the
restricted partition number

$$
P_H(X) \approx \frac{1}{(N-1)!} \, \frac{X^{N-1}}{x_2 \cdots x_N},
\tag{13}
$$

valid for large X. Substituting this approximation into Eq. (12),
removing the floor function and replacing the summations by integrals,
we find

$$
\langle n_j \rangle \approx \frac{1}{X^{N-1}} \int_{1}^{X/x_j} (X - z x_j)^{N-1} \, dz
  = \frac{1}{X^{N-1}} \, \frac{(X - x_j)^{N}}{N x_j}
  = \frac{X}{N x_j} \left[ 1 + O(X^{-1}) \right].
\tag{14}
$$

A rigorous calculation (including the dependence of the error term on
the part set H), found by bounding the partition number more precisely,
is presented in the Appendix.


IV. BENFORD’S LAW 


As described above, power laws, such as Zipf’s law, or other “fat” or
“long-tailed” distributions have been studied intensively [5]. Here we
have found the simplest power law for the number of parts of a given
size, in the equipartition limit Eq. (6). Note that the power law is for
the average number of parts, as opposed to the probability distribution
for the number of each part (which is exponential). In some sense, this
can be seen as the simplest possible distribution, as only one
conservation constraint has been imposed on the number of parts. Most
importantly, this simplest power law leads directly to Benford’s law
[10], which we reproduce here for completeness.

Specifically, if we extend the equipartition result Eq. (6) to a system
in which we can sample from a continuous set of pieces of size x, each
occurring with probability proportional to 1/x, the expected digit
distribution (over any interval $10^p \to 10^{p+1}$ in x) will be
Benford:

$$
P_d = \frac{\int_{d \, 10^p}^{(d+1) 10^p} dx / x}{\int_{10^p}^{10^{p+1}} dx / x}
    = \frac{\ln(1 + 1/d)}{\ln 10}
    = \log_{10}\!\left( 1 + \frac{1}{d} \right).
\tag{15}
$$

We note that other long-tailed distributions may exhibit Benford-like
behavior [23], and thus many of the distributions recently studied [3-5]
may also be candidates to describe how Benford-like data sets emerge, but
the inverse power law shown here uniquely leads to the exact Benford
distribution.
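
The step from a 1/x density to Benford's law in Eq. (15) can be illustrated by simulation. The sketch below is a hedged example: the inverse-CDF sampling, the number of decades, and the seed are our own assumptions. It draws samples with density proportional to 1/x over a whole number of decades and compares the observed first-digit frequencies with $\log_{10}(1 + 1/d)$.

```python
import math
import random

def benford(d):
    """Benford probability of leading digit d, Eq. (15)."""
    return math.log10(1 + 1 / d)

def first_digit(x):
    """Leading decimal digit of a positive float (17 significant digits)."""
    return int(f"{x:.16e}"[0])

def sample_inverse_power(n, decades=6, seed=0):
    """Draw n samples with density ~ 1/x on [1, 10**decades)."""
    rng = random.Random(seed)
    # inverse-CDF sampling: log10(x) is uniform on [0, decades)
    return [10 ** (decades * rng.random()) for _ in range(n)]

samples = sample_inverse_power(200_000)
counts = [0] * 10
for x in samples:
    counts[first_digit(x)] += 1
freqs = [c / len(samples) for c in counts]
for d in range(1, 10):
    print(d, round(freqs[d], 4), round(benford(d), 4))
```

Sampling over an integer number of decades matters here: a 1/x density truncated to a non-integer number of decades gives digit frequencies that deviate from Benford's law.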


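As an example of an almost Benford distribution, we consider the maximum
entropy result Eq. (4), for continuous x. For this distribution we can
perform a similar calculation to find $P_d$, and find that

$$
P_d = \frac{\ln\left[ \left( 1 - e^{-\beta (d+1) 10^p} \right) \left( 1 - e^{-\beta d \, 10^p} \right)^{-1} \right]}
           {\ln\left[ \left( 1 - e^{-\beta \, 10^{p+1}} \right) \left( 1 - e^{-\beta \, 10^p} \right)^{-1} \right]}
    \approx \log_{10}\!\left( 1 + \frac{1}{d} \right)
      + \beta \, \frac{10^p}{2 \ln 10} \left[ 9 \log_{10}\!\left( 1 + \frac{1}{d} \right) - 1 \right]
    \quad \text{for } \beta \ll 10^{-p},
\tag{16}
$$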


so that, in the equipartition limit $\beta \to 0$, we recover the
Benford digit distribution. For large $\beta$, the digit distribution
tends to the exponential form
$P_d \approx e^{-\beta (d-1) 10^p} \left( 1 - e^{-\beta \, 10^p} \right)$.
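
The behavior of the exact ratio in Eq. (16) can be checked numerically. The sketch below evaluates the ratio of logarithms for the continuous maximum entropy distribution $n(x) = 1/(e^{\beta x} - 1)$ and compares it, for small $\beta$, with a first-order expansion obtained from $\ln(1 - e^{-y}) \approx \ln y - y/2$. The function names and the chosen values of $\beta$ are illustrative assumptions.

```python
import math

def digit_prob(d, beta, p=0):
    """Exact ratio of logarithms in Eq. (16)."""
    s = 10.0 ** p
    def F(x):
        # ln(1 - e^{-beta x}), the antiderivative of n(x) up to a factor 1/beta
        return math.log(-math.expm1(-beta * x))
    return (F((d + 1) * s) - F(d * s)) / (F(10 * s) - F(s))

def digit_prob_first_order(d, beta, p=0):
    """Benford value plus the correction linear in beta (small beta only)."""
    b = math.log10(1 + 1 / d)
    return b + beta * 10.0 ** p / (2 * math.log(10)) * (9 * b - 1)

for beta in (1e-6, 1e-3, 1e-1, 5.0):
    print(beta, [round(digit_prob(d, beta), 4) for d in range(1, 10)])
```

As $\beta$ grows, the weight shifts toward the digit 1, in line with the exponential cutoff of the underlying distribution.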
V. BEYOND BENFORD: GENERAL POWER LAWS


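We now consider an extension of the maximum entropy principle to allow
for arbitrary power-law distributions, along with natural cutoffs. First,
we modify the conservation law of Eq. (1) to a more general form

$$
X = \sum_{j=1}^{N} n_j x_j^{\alpha}.
\tag{17}
$$

This equation can be interpreted geometrically, so that $\alpha = 1$ is
like the partitioning of a line, $\alpha = 2$ the partitioning of an
area, and general $\alpha$ would correspond to more graph-like fractal
geometries. In addition to this generalization, we add a chemical
potential $\mu$, corresponding to an average total number of fragments
$\mathcal{N} = \langle n_1 + \cdots + n_N \rangle$:

$$
S = -\sum_{\mathbf{n}} p(\mathbf{n}) \ln p(\mathbf{n})
    - a \left[ \sum_{\mathbf{n}} p(\mathbf{n}) - 1 \right]
    - \beta \left[ \sum_{\mathbf{n}} p(\mathbf{n}) \sum_{j=1}^{N} n_j x_j^{\alpha} - X \right]
    - \mu \left[ \sum_{\mathbf{n}} p(\mathbf{n}) \sum_{j=1}^{N} n_j - \mathcal{N} \right].
\tag{18}
$$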

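Maximizing this entropy yields the probability distribution

$$
p(\mathbf{n}) = \prod_{j=1}^{N} \left( 1 - e^{-\beta x_j^{\alpha} - \mu} \right)
                e^{-(\beta x_j^{\alpha} + \mu) n_j}.
\tag{19}
$$

When we calculate the fragment distribution for this probability, we now
find

$$
\langle n_j \rangle = \frac{1}{e^{\beta x_j^{\alpha} + \mu} - 1}.
\tag{20}
$$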

This function, a generalized distribution n(x) for fragments of size x,
can be used to study many empirical data sets with power-law regions.
Specifically, this function has the following properties, as shown in
Fig. 2. First, when $x \ll x_1 = (\mu/\beta)^{1/\alpha}$, the fragment
distribution becomes a constant $n(x) \approx 1/(e^{\mu} - 1)$, which
will be finite for $\mu > 0$. Second, for
$x_1 < x < x_2 = \beta^{-1/\alpha}$, n(x) is approximately a power law
$n(x) \sim x^{-\alpha}$. Finally, for $x \gg x_2$, the distribution falls
exponentially, $n(x) \approx e^{-\beta x^{\alpha} - \mu}$. Thus, this is a
normalizable distribution with three characteristic regions generalizing
both the exponential and power-law distributions.
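
The three regimes of Eq. (20) can be made concrete with a short numerical sketch. It assumes crossover scales defined by $\beta x_1^{\alpha} = \mu$ and $\beta x_2^{\alpha} = 1$, and uses the illustrative values $x_1 = 1$, $x_2 = 1000$, $\alpha = 1$; the helper name `n_general` is hypothetical.

```python
import math

def n_general(x, alpha, beta, mu):
    """Generalized fragment distribution of Eq. (20) for continuous x."""
    return 1.0 / math.expm1(beta * x**alpha + mu)

# Assumed mapping from crossover scales to parameters (illustrative values).
alpha, x1, x2 = 1.0, 1.0, 1000.0
beta = x2 ** (-alpha)        # beta * x2**alpha == 1
mu = beta * x1**alpha        # beta * x1**alpha == mu

plateau = n_general(1e-3, alpha, beta, mu)    # x << x1: nearly constant
middle = n_general(30.0, alpha, beta, mu)     # x1 < x < x2: close to 1/(beta x^alpha)
tail = n_general(1e4, alpha, beta, mu)        # x >> x2: exponential decay

print(plateau, 1.0 / math.expm1(mu))
print(middle, 1.0 / (beta * 30.0**alpha))
print(tail, math.exp(-beta * 1e4**alpha - mu))
```

With only a few decades between $x_1$ and $x_2$, the power-law region is approximate; the agreement with $1/(\beta x^{\alpha})$ improves as the two scales move further apart.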

This generalized power-law distribution has many similarities to those
derived from maximum entropy subject to alternative constraints [3-5].
However, instead of directly constructing a probability distribution, we
use the probability of obtaining a particular part set [given by
Eq. (19)] to find the average number of parts [given by Eq. (20)]. It is
the latter which produces a generalized power law. This alternative route
to generalized power laws may be appropriate for data found by averaging
over many realizations.

FIG. 2: Example fragment distributions using the generalized power law
Eq. (20), with $x_1 = 1$, $x_2 = 1000$ (see text) and $\alpha = 1$
(black) and $\alpha = 2$ (gray). Also shown are the corresponding power
laws (dashed).


VI. CONCLUSION 

In this paper we have explored the question of the distribution of
numbers arising from the partitioning of a quantity X into a set of
pieces $x_j$. We have found, using maximum entropy and exact counting,
that the average number of parts of size $x_j$ tends to the equipartition
result $\langle n_j \rangle = X/(N x_j)$ when $X \gg 1$. This result is
intimately related to the statistical mechanics of quantum oscillators
and their high-temperature limit. Extending this result to a continuous
set of parts provides an attractive route to Benford’s law, an empirical
observation regarding the first digits of many real-world data sets.
Finally, this type of model can be used to generate long-tailed
distributions using a small number of parameters, also relevant to many
real-world data sets. Here we consider the open questions regarding the
application to Benford’s law.

First, the limiting process to go from a discrete set of N parts to a
continuous set requires us to specify how both X and N tend to infinity.
The rigorous bounds derived in the Appendix require that
$X \gg N^2 x_N$, while careful analysis of the maximum entropy result
suggests that a weaker condition of $X \gg N x_N$ may be possible.
Understanding exactly when the equipartition result and Benford’s law
are truly applicable remains an open question. This is relevant to
whether the model presented here is truly applicable to real-world data
sets such as the division of a large population (X) into groups of
various sizes ($\{x_j\}$). Second, the character of the generalized power
laws remains obscure. It would be useful to have a more physical
interpretation of the conservation law, and to know whether it has a
connection to the other generalized power laws discussed in the
literature [2-5]. The variety of these results suggests that there may be
many processes underlying these distributions. This raises a final
question, whether the fully random partitioning considered here
corresponds to any real-world process.

Regarding this final point, we have recently analyzed multiple random
fragmentation scenarios for a one-dimensional object, similar to those
studied in [24, 25], and find that the resulting fragments obey Benford’s
law in the long-time limit [26]. The convergence of such a process is
currently under investigation. We hope that continued statistical
analysis of these problems will help shed light on how power laws and
Benford’s law emerge in such varied phenomena in nature.
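Appendix: Partition Number Bounds

In this Appendix, we provide bounds on the partition number $P_H(X)$ and
the average number of parts $\langle n_j \rangle$. We begin by observing
that the exact restricted partition number can be written as an explicit
sum over all possible partitions:

$$
P_H(X) = \sum_{n_N=0}^{\lfloor L_N \rfloor} \sum_{n_{N-1}=0}^{\lfloor L_{N-1} \rfloor}
         \cdots \sum_{n_2=0}^{\lfloor L_2 \rfloor} \sum_{n_1=0}^{\lfloor L_1 \rfloor}
         \delta\!\left( X - \sum_{x_h \in H} n_h x_h \right),
\tag{A.1}
$$

where the upper limits $\lfloor L_k \rfloor$ denote the maximum number of
$x_k$ that can be subtracted from the remainder of X, namely

$$
L_k = \frac{1}{x_k} \left( X - \sum_{j=k+1}^{N} n_j x_j \right).
\tag{A.2}
$$

For this calculation, we will only consider those sets H such that
$x_1 = 1$. This ensures that a partition will exist for every X and
allows us to sum over the delta function. We then can disregard the sum
over $n_1$, as once the other $n_j$ are determined, it will only have one
possible value. Thus we consider

$$
P_H(X) = \sum_{n_N=0}^{\lfloor L_N \rfloor} \sum_{n_{N-1}=0}^{\lfloor L_{N-1} \rfloor}
         \cdots \sum_{n_2=0}^{\lfloor L_2 \rfloor} 1.
\tag{A.3}
$$

We now find upper and lower bounds for $P_H(X)$.

We begin with a lower bound. Given that our summands $f(n_j)$ are all
positive and non-increasing, we have the inequality (A.12 in [27])

$$
\sum_{n=0}^{\lfloor L \rfloor} f(n) \ge \int_{0}^{\lfloor L \rfloor + 1} f(n) \, dn
  \ge \int_{0}^{L} f(n) \, dn,
\tag{A.4}
$$

where we have used the fact that $\lfloor L \rfloor > L - 1$. It follows,
then, that

$$
P_H(X) \ge \int_{n_N=0}^{L_N} \int_{n_{N-1}=0}^{L_{N-1}} \cdots \int_{n_2=0}^{L_2}
           dn_2 \cdots dn_N.
\tag{A.5}
$$

It is fairly straightforward to integrate this expression, using a
recursion relation [from Eq. (A.2)]

$$
L_k = \frac{x_{k+1}}{x_k} \left( L_{k+1} - n_{k+1} \right), \qquad k = 2, \ldots, N-1,
\tag{A.6}
$$

to find

$$
P_H(X) \ge \frac{1}{(N-1)!} \, \frac{X^{N-1}}{x_2 \cdots x_N}.
\tag{A.7}
$$

For the upper bound, we again convert our sums into integrals. Here,
however, we use the alternative inequality (A.12 in [27])

$$
\sum_{n=0}^{\lfloor L \rfloor} f(n) \le \int_{-1}^{\lfloor L \rfloor} f(n) \, dn
  \le \int_{-1}^{L} f(n) \, dn
  = \int_{0}^{L+1} f(n' - 1) \, dn',
\tag{A.8}
$$

where $\lfloor L \rfloor \le L$ and we have changed variables
$n' = n + 1$. In terms of these variables, we note that

$$
L_k + 1 = \frac{1}{x_k} \left( X + \sum_{j=k}^{N} x_j - \sum_{j=k+1}^{N} n'_j x_j \right).
\tag{A.9}
$$

We thus use one more inequality

$$
L_k + 1 \le L'_k = \frac{1}{x_k} \left( X' - \sum_{j=k+1}^{N} n'_j x_j \right),
\tag{A.10}
$$

where

$$
X' = X + \sum_{j=2}^{N} x_j,
\tag{A.11}
$$

and the equality occurs for k = 2. Altogether we find

$$
P_H(X) \le \int_{n'_N=0}^{L'_N} \int_{n'_{N-1}=0}^{L'_{N-1}} \cdots \int_{n'_2=0}^{L'_2}
           dn'_2 \cdots dn'_N.
\tag{A.12}
$$

These integrals can be evaluated as in the lower bound case to yield

$$
P_H(X) \le \frac{1}{(N-1)!} \, \frac{X'^{\,N-1}}{x_2 \cdots x_N}
  = \frac{1}{(N-1)!} \, \frac{1}{x_2 \cdots x_N}
    \left( X + \sum_{k=2}^{N} x_k \right)^{N-1}.
\tag{A.13}
$$

Having bounded the partition number, we can provide upper and lower
bounds for $\langle n_j \rangle$, using

$$
\langle n_j \rangle = \frac{1}{P_H(X)} \sum_{\ell=1}^{\lfloor X/x_j \rfloor} P_H(X - \ell x_j).
\tag{A.14}
$$

To get a lower bound for $\langle n_j \rangle$, we use the upper bound
for $P_H(X)$ in the denominator and the lower bound for $P_H(X)$ in the
sum. Using Eqs. (A.7) and (A.13) in Eq. (A.14), we get

$$
\langle n_j \rangle \ge \frac{1}{X'^{\,N-1}} \sum_{\ell=1}^{\lfloor X/x_j \rfloor} (X - \ell x_j)^{N-1}
  \ge \frac{(X - x_j)^{N}}{N x_j \, X'^{\,N-1}}
  > \frac{X}{N x_j} \left( 1 - \frac{x_j}{X} \right)^{N}
    \left( 1 - \frac{1}{X} \sum_{k=2}^{N} x_k \right)^{N-1},
\tag{A.15}
$$

where we have used the inequality $(1 + x)^{-1} > 1 - x$ and Eq. (A.4)
for the summation.

To get an upper bound for $\langle n_j \rangle$, we use the lower bound
for $P_H(X)$ in the denominator and the upper bound for $P_H(X)$ in the
sum. Again, using Eqs. (A.7) and (A.13) in Eq. (A.14), we have

$$
\langle n_j \rangle \le \frac{1}{X^{N-1}} \sum_{\ell=1}^{\lfloor X/x_j \rfloor} (X' - \ell x_j)^{N-1}
  < \frac{X'^{\,N}}{N x_j \, X^{N-1}}
  = \frac{X}{N x_j} \left( 1 + \frac{1}{X} \sum_{k=2}^{N} x_k \right)^{N},
\tag{A.16}
$$

where here we have used the inequality of Eq. (A.8) for the summation.

Taking Equations (A.15) and (A.16) together, we conclude that

$$
\frac{X}{N x_j} \left( 1 - \frac{x_j}{X} \right)^{N}
  \left( 1 - \frac{1}{X} \sum_{k=2}^{N} x_k \right)^{N-1}
  < \langle n_j \rangle
  < \frac{X}{N x_j} \left( 1 + \frac{1}{X} \sum_{k=2}^{N} x_k \right)^{N}.
\tag{A.17}
$$

In the large X limit, we Taylor expand each side of Eq. (A.17) to find

$$
\frac{X}{N x_j} \left( 1 - \frac{N x_j}{X} - \frac{N-1}{X} \sum_{k=2}^{N} x_k + \cdots \right)
  < \langle n_j \rangle
  < \frac{X}{N x_j} \left( 1 + \frac{N}{X} \sum_{k=2}^{N} x_k + \cdots \right).
$$

We conclude that

$$
\langle n_j \rangle = \frac{X}{N x_j} \left[ 1 + O(X^{-1}) \right],
\tag{A.18}
$$

in agreement with the calculations presented in the text.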
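
The volume bounds on the partition number, Eqs. (A.7) and (A.13), can be compared with exact counts. The sketch below assumes the standard dynamic-programming recurrence for the exact value and evaluates both bounds for the part set {1, ..., 10}; the function names are hypothetical and the chosen values of X are illustrative.

```python
import math

def partition_number(X, parts):
    """Exact P_H(X) via the standard recurrence over the part set."""
    P = [0] * (X + 1)
    P[0] = 1
    for x in parts:
        for k in range(x, X + 1):
            P[k] += P[k - x]
    return P[X]

def volume_bounds(X, parts):
    """Lower bound of Eq. (A.7) and upper bound of Eq. (A.13)."""
    parts = sorted(parts)
    if parts[0] != 1:
        raise ValueError("the part set must contain x_1 = 1")
    N = len(parts)
    denom = math.factorial(N - 1) * math.prod(parts[1:])
    lower = X ** (N - 1) / denom
    upper = (X + sum(parts[1:])) ** (N - 1) / denom   # uses X' of Eq. (A.11)
    return lower, upper

parts = list(range(1, 11))
for X in (25, 100, 800):
    lower, upper = volume_bounds(X, parts)
    print(X, lower, partition_number(X, parts), upper)
```

The two bounds differ by the factor $(X'/X)^{N-1}$, so they only become tight once X is large compared with N times the sum of the part sizes.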